Time-Scaling of Audio Signals with Multi-Scale Gabor Analysis
The phase vocoder is a standard frequency-domain time-scaling technique suitable for polyphonic audio, but it generates annoying artifacts known as phasiness (or loss of presence) and transient smearing, especially for high values of the time-scale parameter. In this paper, a new time-scaling algorithm for polyphonic audio signals is described. It uses a multi-scale Gabor analysis for the low-frequency content and a vocoder with phase-locking on transients for the residual signal and for the high-frequency content. Compared to a phase-locking vocoder alone, our method significantly reduces both phasiness and transient smearing, especially for high values of the time-scale parameter. For time contraction (time-scale parameters lower than one), the results appear to be more signal-dependent.
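For reference, the classical phase-vocoder time-stretch that this work improves on can be sketched as below. This is only a minimal illustration of standard STFT-based time-scaling (with arbitrary window and hop sizes, and no gain normalization), not the multi-scale Gabor algorithm of the paper; it is precisely the kind of processing that exhibits the phasiness and transient smearing discussed above.

```python
import numpy as np

def phase_vocoder_stretch(x, rate, n_fft=2048, hop=512):
    """Minimal phase-vocoder time-stretch (rate > 1 lengthens the signal)."""
    win = np.hanning(n_fft)
    # Analysis STFT
    n_frames = 1 + (len(x) - n_fft) // hop
    stft = np.array([np.fft.rfft(win * x[i * hop:i * hop + n_fft])
                     for i in range(n_frames)])

    # Output frame positions on a resampled time axis
    time_steps = np.arange(0, n_frames - 1, 1.0 / rate)
    # Expected phase advance per hop for each bin
    omega = 2 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft

    phase = np.angle(stft[0])
    out_frames = []
    for t in time_steps:
        k = int(t)
        frac = t - k
        # Interpolate magnitude between surrounding analysis frames
        mag = (1 - frac) * np.abs(stft[k]) + frac * np.abs(stft[k + 1])
        out_frames.append(mag * np.exp(1j * phase))
        # Accumulate phase from the measured instantaneous frequency
        dphi = np.angle(stft[k + 1]) - np.angle(stft[k]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))  # principal value
        phase += omega + dphi

    # Overlap-add synthesis
    y = np.zeros(len(out_frames) * hop + n_fft)
    for i, frame in enumerate(out_frames):
        y[i * hop:i * hop + n_fft] += win * np.fft.irfft(frame, n_fft)
    return y
```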
The DESAM Toolbox: Spectral Analysis of Musical Audio
This paper presents the DESAM Toolbox, a set of Matlab functions dedicated to the estimation of widely used spectral models for music signals. Although these models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. The toolbox is rather aimed at providing a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different “mid-level” representations. After motivating the need for such a toolbox, this paper gives an overview of its overall organization and describes all available functionalities.
Adjusting the Spectral Envelope Evolution of Transposed Sounds with Gabor Mask Prototypes
Audio samplers often need to modify the pitch of recorded sounds in order to generate scales or chords. This article addresses the use of Gabor masks and their ability to improve the perceptual realism of transposed notes obtained with the classical phase-vocoder algorithm. Gabor masks can be seen as operators that modify the time-dependent spectral content of sounds by acting on their time-frequency representation. The goal here is to restore a distribution of energy that is more in line with the physics of the structure that generated the original sound. The Gabor mask is built from an estimate of the spectral envelope evolution in the time-frequency plane and then applied to the modified Gabor transform. This operation turns the modified Gabor transform into another one that respects the estimated spectral envelope evolution, and therefore leads to a note that is perceptually more convincing.
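To illustrate the kind of operation a Gabor mask performs, the sketch below applies a multiplicative time-frequency mask that re-imposes a target spectral-envelope evolution on the transform of a transposed sound. The envelope estimate (a simple smoothing of the STFT magnitude along frequency) and all parameter values are placeholder assumptions for illustration, not the mask construction used in the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def apply_envelope_mask(x_transposed, env_target, fs, n_fft=1024):
    """Re-shape the time-frequency content of a transposed note.

    env_target: estimate of the original note's spectral-envelope evolution,
    sampled on the same time-frequency grid as the STFT below (hypothetical
    input; the paper estimates it from the recording itself).
    """
    f, t, X = stft(x_transposed, fs=fs, nperseg=n_fft)
    # Crude envelope of the transposed sound: magnitude smoothed along frequency
    kernel = np.ones(9) / 9.0
    env_current = np.apply_along_axis(
        lambda m: np.convolve(m, kernel, mode="same"), 0, np.abs(X))
    # Multiplicative (Gabor-mask-like) correction so the transposed sound
    # follows the target envelope evolution
    mask = env_target / np.maximum(env_current, 1e-8)
    _, y = istft(X * mask, fs=fs, nperseg=n_fft)
    return y
```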
Modal analysis of impact sounds with ESPRIT in Gabor transforms
Identifying the acoustical modes of a resonant object can be achieved by expanding a recorded impact sound in a sum of damped sinusoids. High-resolution methods, e.g. the ESPRIT algorithm, can be used, but the time-length of the signal often requires a sub-band decomposition. This ensures, thanks to sub-sampling, that the signal is analysed over a significant duration so that the damping coefficient of each mode is estimated properly, and that no frequency band is neglected. In this article, we show that the ESPRIT algorithm can be efficiently applied in a Gabor transform (similar to a sub-sampled short-time Fourier transform). The combined use of a time-frequency transform and a high-resolution analysis allows selective and sharp analysis over selected areas of the time-frequency plane. Finally, we show that this method produces high-quality resynthesized impact sounds which are perceptually very close to the original sounds.
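The underlying signal model is the sum of exponentially damped sinusoids (one per acoustical mode) whose frequencies, damping coefficients, amplitudes and phases ESPRIT-type methods estimate. A toy synthesis of that model is sketched below; the three sets of modal parameters are arbitrary placeholders, not values measured from a real object.

```python
import numpy as np

def damped_sinusoid_model(amps, freqs, dampings, phases, fs, duration):
    """Synthesize s(t) = sum_k a_k * exp(-alpha_k * t) * cos(2*pi*f_k*t + phi_k),
    the modal (sum-of-damped-sinusoids) model estimated by ESPRIT-type analysis."""
    t = np.arange(int(duration * fs)) / fs
    s = np.zeros_like(t)
    for a, f, alpha, phi in zip(amps, freqs, dampings, phases):
        s += a * np.exp(-alpha * t) * np.cos(2 * np.pi * f * t + phi)
    return s

# Example: a toy three-mode "impact" resynthesis (hypothetical parameters)
y = damped_sinusoid_model(amps=[1.0, 0.5, 0.3],
                          freqs=[220.0, 560.0, 1150.0],   # Hz
                          dampings=[8.0, 15.0, 30.0],     # 1/s
                          phases=[0.0, 0.0, 0.0],
                          fs=44100, duration=1.0)
```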
Navigating in a Space of Synthesized Interaction-Sounds: Rubbing, Scratching and Rolling Sounds
In this paper, we investigate a control strategy for synthesized interaction sounds. The framework of our research is based on the action/object paradigm, which considers that sounds result from an action on an object. This paradigm presumes that there exist sound invariants, i.e. perceptually relevant signal morphologies that carry information about the action or the object. Some of these auditory cues are considered for rubbing, scratching and rolling interactions. A generic sound synthesis model allowing the production of these three types of interaction is detailed, together with a control strategy for this model. The proposed control strategy allows users to navigate continuously in an “action space” and to morph between interactions, e.g. from rubbing to rolling.
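The notion of continuous navigation can be illustrated by interpolating between parameter presets attached to each interaction, as in the small sketch below. The preset vectors and the barycentric morphing rule are purely hypothetical stand-ins; the paper's actual synthesis model and control mapping are not reproduced here.

```python
import numpy as np

# Hypothetical synthesis-parameter presets for the three interaction types
PRESETS = {
    "rubbing":    np.array([0.8, 0.2, 0.1]),
    "scratching": np.array([0.9, 0.7, 0.2]),
    "rolling":    np.array([0.3, 0.1, 0.9]),
}

def morph(weights):
    """Barycentric interpolation between presets, e.g.
    morph({"rubbing": 0.5, "rolling": 0.5}) lies halfway between
    rubbing and rolling in this toy 'action space'."""
    total = sum(weights.values())
    return sum(w / total * PRESETS[name] for name, w in weights.items())
```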
A Very Low Latency Pitch Tracker for Audio to MIDI Conversion
An algorithm for estimating the fundamental frequency of a single-pitch audio signal is described, for application to audio-to-MIDI conversion. In order to minimize latency, this method is based on the ESPRIT algorithm, together with a statistical model of the partial frequencies. It is tested on real guitar recordings and compared to the YIN estimator. We show that, in this particular context, both methods exhibit similar accuracy, but the periodicity measure used for note segmentation is much more stable with the ESPRIT-based algorithm, which significantly reduces ghost notes. This method is also able to get very close to the theoretical minimum latency, i.e. the fundamental period of the lowest observable pitch. Furthermore, fast implementations appear to reach a reasonable complexity and could be compatible with real-time operation, although this is not tested in this study.
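For a guitar in standard tuning, the lowest observable pitch is the open low E string (E2, about 82.4 Hz), so the theoretical latency bound quoted above works out to roughly 12 ms; the short computation below makes the arithmetic explicit, assuming a 44.1 kHz sampling rate.

```python
# Theoretical minimum latency = one period of the lowest observable pitch.
F_MIN = 82.41   # Hz, open low E string (E2) on a standard-tuned guitar
FS = 44100      # Hz, assumed sampling rate

min_latency_s = 1.0 / F_MIN              # ~0.0121 s
min_latency_samples = round(FS / F_MIN)  # ~535 samples

print(f"minimum latency: {min_latency_s * 1e3:.1f} ms "
      f"({min_latency_samples} samples at {FS} Hz)")
```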
Sparse Decomposition of Audio Signals Using a Perceptual Measure of Distortion. Application to Lossy Audio Coding
State-of-the-art audio codecs use time-frequency transforms derived from cosine bases, followed by a quantization stage. The quantization steps are set according to perceptual considerations. In the last decade, several studies applied adaptive sparse time-frequency transforms to audio coding, e.g. on unions of cosine bases using a Matching-Pursuit-derived algorithm [1]; this was shown to significantly improve coding efficiency. We propose another approach based on a variational algorithm, i.e. the optimization of a cost function taking into account both a perceptual distortion measure derived from a hearing model and a sparsity constraint, which favors coding efficiency. In this early version, we show that, using a coding scheme without perceptual control of quantization, our method outperforms a codec from the literature with the same quantization scheme [1]. In future work, a more sophisticated quantization scheme should allow our method to challenge standard codecs such as AAC. Index Terms: Audio coding, Sparse approximation, Iterative thresholding algorithm, Perceptual model.
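A cost of this kind, a perceptually weighted distortion plus an L1 sparsity penalty, can be minimized with an iterative soft-thresholding (ISTA-type) scheme, as sketched below. The fixed diagonal weighting standing in for a hearing model and the generic dictionary matrix are assumptions for illustration only, not the paper's perceptual model or transform.

```python
import numpy as np

def ista_sparse_code(x, D, w, lam, n_iter=200):
    """Minimize  || diag(w) (x - D a) ||_2^2 + lam * ||a||_1  over a.

    x: signal (n,);  D: dictionary (n, m);  w: per-sample perceptual weights (n,)
    (a diagonal stand-in for a hearing model);  lam: sparsity trade-off.
    Returns the sparse coefficient vector a.
    """
    Dw = D * w[:, None]              # weighted dictionary
    xw = x * w                       # weighted target
    L = np.linalg.norm(Dw, 2) ** 2   # spectral norm squared (step-size bound)
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = Dw.T @ (Dw @ a - xw)  # (half) gradient of the data term
        z = a - grad / L             # gradient step
        # Soft threshold = proximal operator of the L1 penalty
        a = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)
    return a
```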